Introduction
Data and Variables: This project examines the relationship between extreme poverty and fertility rates using Gapminder data:
Children per Woman (Total Fertility Rate): The average number of children a woman has.
Extreme Poverty Rate: The percentage of people living below the international poverty line.
Hypothesis:
We hypothesize that higher poverty rates correlate with higher fertility rates due to economic necessity and limited access to contraception. Declining poverty is expected to lower fertility due to improved education, healthcare, and economic opportunities.
Data Cleaning Steps:
1. Reshaped Data: Converted from wide to long format using pivot_longer().
2. Merged Datasets: Joined on country and year.
3. Handled Missing Values: Removed incomplete observations with drop_na().
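The three cleaning steps can be sketched on a toy example. The two small tables below are hypothetical stand-ins for the Gapminder files (the names `fertility_wide`, `poverty_wide`, and `clean_toy` are made up for illustration), so this is a minimal sketch of the technique rather than the exact pipeline used in the report.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format tables: one row per country, one column per year.
fertility_wide <- tibble(
  country = c("A", "B"),
  `2000` = c(5.1, 2.0),
  `2001` = c(5.0, NA)
)
poverty_wide <- tibble(
  country = c("A", "B"),
  `2000` = c(60, 3),
  `2001` = c(58, 2)
)

# Step 1: reshape each table from wide to long (one row per country-year).
fertility_long <- fertility_wide |>
  pivot_longer(cols = -country,
               names_to = "year",
               values_to = "children_per_woman")
poverty_long <- poverty_wide |>
  pivot_longer(cols = -country,
               names_to = "year",
               values_to = "extreme_poverty_rate")

# Steps 2 and 3: join on country and year, then drop incomplete rows.
clean_toy <- fertility_long |>
  full_join(poverty_long, by = c("country", "year")) |>
  drop_na()

print(clean_toy)
```

In this toy data, country B has no fertility value for 2001, so that country-year is removed by `drop_na()` and three complete rows remain.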
Considerations:
Historical data are limited, and other socioeconomic factors that can affect fertility are not included in the analysis.
Data Visualization
The animated visualization shows how extreme poverty and fertility rates change over time. Both variables trend downward: as time progresses, fertility rates and poverty levels tend to decrease. This is consistent with the hypothesis that economic development and improved access to resources contribute to lower fertility rates.
The scatterplot shows a positive relationship between extreme poverty rates and fertility rates across countries. Each point is one country, averaged over all years with complete data. Countries with higher poverty rates tend to have higher fertility rates, whereas countries with lower poverty rates tend to have fewer children per woman. The points are spread around this trend, but overall, higher extreme poverty is clearly associated with higher fertility.
Linear Regression
The estimated regression equation is “Fertility_Rate = 2.981 + 0.036 * Poverty_Rate”, where 2.981 (the intercept) is the predicted fertility rate when the extreme poverty rate is zero and 0.036 (the slope) is the expected change in the fertility rate for each one-percentage-point change in the extreme poverty rate.
The intercept suggests that even in the absence of extreme poverty, the fertility rate is still about 3 children per woman. The slope indicates that when extreme poverty increases by 1 percentage point, the fertility rate is expected to increase by 0.036 children per woman.
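To make the interpretation concrete, the reported equation can be evaluated at a few poverty rates. This is a small sketch using the rounded coefficients quoted above; `predict_fertility` is a hypothetical helper name, not a function from the analysis.

```r
# Coefficients as reported in the text (rounded to three decimals).
intercept <- 2.981
slope <- 0.036

# Hypothetical helper: predicted children per woman for an extreme
# poverty rate given in percent (0 to 100).
predict_fertility <- function(poverty_rate) {
  intercept + slope * poverty_rate
}

# Predictions at 0%, 10%, and 50% extreme poverty.
predictions <- predict_fertility(c(0, 10, 50))
print(predictions)

# A one-percentage-point increase always adds the slope.
one_point_change <- predict_fertility(11) - predict_fertility(10)
print(one_point_change)
```

With these coefficients the predicted fertility rates are 2.981, 3.341, and 4.781 children per woman at 0%, 10%, and 50% extreme poverty.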
Model Fit
The proportion of variability explained (R²) is the variance of the fitted values divided by the variance of the response: 0.3640/0.8558 = 0.425. About 42.5% of the variation in fertility rates can therefore be explained by extreme poverty. The remaining variation is most likely influenced by other socioeconomic and cultural factors not included in this model.
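The arithmetic above can be checked directly, and the variance-ratio definition can be compared against the R² that `summary()` reports. The second half of this sketch uses the built-in `mtcars` data purely as a stand-in, since the Gapminder files are not bundled here; the variable names are illustrative.

```r
# Variances as reported in the text.
var_fitted <- 0.3640
var_response <- 0.8558
r_squared_reported <- var_fitted / var_response
print(round(r_squared_reported, 3))

# Stand-in check on built-in data: for a linear model with an intercept,
# var(fitted) / var(response) matches the R-squared from summary().
toy_model <- lm(mpg ~ wt, data = mtcars)
variance_ratio <- var(fitted(toy_model)) / var(mtcars$mpg)
print(all.equal(variance_ratio, summary(toy_model)$r.squared))
```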
Visualising Model Simulations
The scatterplots comparing observed and simulated fertility rates against extreme poverty both show a clear positive relationship: higher extreme poverty rates are associated with higher fertility rates. This suggests that the simulated data reasonably reproduce the overall trend in the observed data.
The observed data (left plot) show slightly more variability, with points scattered more irregularly around the trend line, while the simulated data (right plot) cluster closer to it. The tighter clustering in the simulated data suggests that other factors, such as cultural, economic, or healthcare-related variables, affect fertility rates in ways not accounted for by the model.
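Simulating from a fitted regression amounts to adding normally distributed noise, with the model's residual standard error, to the fitted values. The sketch below shows the idea on the built-in `mtcars` data as a stand-in for the country-level averages; `sim_model` and `simulated_mpg` are illustrative names and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch

# Stand-in model on built-in data.
sim_model <- lm(mpg ~ wt, data = mtcars)

# One simulated response per observation:
# fitted value plus Normal(0, residual standard error) noise.
simulated_mpg <- predict(sim_model) +
  rnorm(nrow(mtcars), mean = 0, sd = sigma(sim_model))

# Compare the spread around the fitted line in observed and simulated data.
spread <- c(
  observed  = sd(mtcars$mpg - predict(sim_model)),
  simulated = sd(simulated_mpg - predict(sim_model))
)
print(spread)
```

Using `nrow()` instead of a hard-coded sample size keeps the simulated vector the same length as the data if the number of rows changes.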
Generating Multiple Predictive Checks
The histogram displays the distribution of R² values obtained by regressing observed fertility on each of 1,000 simulated fertility datasets. The R² values center around 0.17, meaning that fertility simulated from the model accounts for about 17% of the variance in observed fertility.
The distribution is relatively symmetric and bell-shaped, with most R² values between 0.1 and 0.3. While this suggests a moderate correlation, a large proportion of variation remains unexplained.
The relatively low R² values suggest that extreme poverty alone does not fully determine fertility rates. Other socioeconomic, cultural, and policy-related factors likely influence the observed fertility trends.
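One way to read the 0.17 figure is to ask what value would be expected if the fitted line were an adequate description of the data. The derivation below is a sketch under assumptions that are not stated in the analysis itself: the notation ($y$ for observed fertility, $\hat{y}$ for fitted values, $y^{*}$ for simulated fertility, $e$ and $e^{*}$ for the respective errors) is introduced here for illustration, and the simulated noise is taken to be independent of the observed residuals with the same variance $\sigma^{2}$.

$$
\begin{aligned}
y_i &= \hat{y}_i + e_i, \qquad y_i^{*} = \hat{y}_i + e_i^{*},\\
\operatorname{Cov}(y, y^{*}) &\approx \operatorname{Var}(\hat{y}), \qquad
\operatorname{Var}(y) \approx \operatorname{Var}(y^{*}) \approx \operatorname{Var}(\hat{y}) + \sigma^{2},\\
\operatorname{Corr}(y, y^{*}) &\approx \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(\hat{y}) + \sigma^{2}} = R^{2}_{\text{model}},\\
R^{2}_{y \sim y^{*}} &= \operatorname{Corr}(y, y^{*})^{2} \approx \left(R^{2}_{\text{model}}\right)^{2} = 0.425^{2} \approx 0.18 .
\end{aligned}
$$

Under these assumptions, R² values near 0.17 are roughly what the fitted model would be expected to produce, so the histogram largely reflects the model's own R² of 0.425 rather than a separate, lower estimate of explained variance.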
Conclusion
This analysis explored the relationship between extreme poverty and women’s fertility rates using both observed and simulated data. We find a positive correlation between extreme poverty and fertility rates, with the regression equation Fertility_Rate = 2.981 + 0.036 * Poverty_Rate. The model explains 42.5% of the variation in fertility rates, indicating that while poverty is an important factor, other variables also contribute to fertility. Regressions of observed fertility on 1,000 simulated datasets produced R² values around 0.17, which is consistent with other socioeconomic, cultural, and healthcare-related factors also influencing the observed fertility trends.
Source Code
---
title: "The Effect of Extreme Poverty on Women's Fertility Rates"
author: "Ainsley Forster, Sophia Taylor"
date: "`r Sys.Date()`"
format: html
theme: cosmo
embed-resources: true
code-tools: true
toc: true
editor: source
execute:
  self-contained: true
  echo: true
  code-fold: true
  message: false
  warning: false
---

## Introduction

```{r}
#| label: load-and-clean-data
#| code-fold: true
#| code-summary: "Show load-and-clean-data code"

library(tidyverse)
library(readr)
library(gganimate)
library(gifski)
library(knitr)
library(gridExtra)

# Read each Gapminder file and reshape from wide (one column per year) to long.
data_children_per_woman <- read_csv("/Users/sophietaylor/Documents/STAT 331/children_per_woman_total_fertility.csv") |>
  pivot_longer(cols = "1800":"2100",
               names_to = "year",
               values_to = "children_per_woman")

data_extreme_poverty_rate <- read_csv("/Users/sophietaylor/Documents/STAT 331/gm_epov_rate.csv") |>
  pivot_longer(cols = "1800":"2100",
               names_to = "year",
               values_to = "extreme_poverty_rate")

# Join on country and year, then keep only complete country-year rows.
clean_data <- data_children_per_woman |>
  full_join(data_extreme_poverty_rate, by = c("country", "year")) |>
  drop_na()
```

Data and Variables: This project examines the relationship between extreme poverty and fertility rates using Gapminder data:

Children per Woman (Total Fertility Rate): The average number of children a woman has.

Extreme Poverty Rate: The percentage of people living below the international poverty line.

Hypothesis:

We hypothesize that higher poverty rates correlate with higher fertility rates due to economic necessity and limited access to contraception. Declining poverty is expected to lower fertility due to improved education, healthcare, and economic opportunities.

Data Cleaning Steps:

1. Reshaped Data: Converted from wide to long format using pivot_longer().
2. Merged Datasets: Joined on country and year.
3. Handled Missing Values: Removed incomplete observations with drop_na().

Considerations:

Historical data are limited, and other socioeconomic factors that can affect fertility are not included in the analysis.

## Data Visualization

```{r}
#| label: poverty-and-fertility-over-time
#| code-fold: true
#| code-summary: "Show poverty-and-fertility-over-time code"

# Animated scatterplot: one frame per year.
poverty_and_fertility_over_time <- clean_data |>
  ggplot(mapping = aes(x = extreme_poverty_rate,
                       y = children_per_woman)) +
  geom_point(aes(group = seq_along(year))) +
  transition_states(year,
                    transition_length = 1,
                    state_length = 1) +
  theme_light() +
  labs(title = "Extreme Poverty and Women's Fertility Over Time",
       subtitle = "Fertility Rate (children per woman)",
       x = "Extreme Poverty Rate (%)",
       y = "") +
  geom_text(aes(x = 85, y = 1.5, label = paste("Year:", year)), size = 5)

animate(poverty_and_fertility_over_time)
```

The animated visualization shows how extreme poverty and fertility rates change over time. Both variables trend downward: as time progresses, fertility rates and poverty levels tend to decrease. This is consistent with the hypothesis that economic development and improved access to resources contribute to lower fertility rates.

```{r}
#| label: poverty-vs-fertility-rates
#| code-fold: true
#| code-summary: "Show poverty-vs-fertility-rates code"

# Average each country over all years with complete data.
clean_data2 <- clean_data |>
  group_by(country) |>
  summarise(mean_fertility = mean(children_per_woman),
            mean_poverty = mean(extreme_poverty_rate))

poverty_vs_fertility_rates <- clean_data2 |>
  ggplot(mapping = aes(x = mean_poverty,
                       y = mean_fertility)) +
  geom_point() +
  theme_light() +
  labs(title = "Extreme Poverty vs. Women's Fertility",
       subtitle = "Fertility Rate (children per woman)",
       x = "Extreme Poverty Rate (%)",
       y = "")

plot(poverty_vs_fertility_rates)
```

The scatterplot shows a positive relationship between extreme poverty rates and fertility rates across countries. Each point is one country, averaged over all years with complete data. Countries with higher poverty rates tend to have higher fertility rates, whereas countries with lower poverty rates tend to have fewer children per woman. The points are spread around this trend, but overall, higher extreme poverty is clearly associated with higher fertility.

## Linear Regression

```{r}
#| label: SLR
#| code-fold: true
#| code-summary: "Show SLR code"

# Simple linear regression of mean fertility on mean extreme poverty.
slr_model <- lm(mean_fertility ~ mean_poverty, data = clean_data2)

print(paste("Fertility_Rate =",
            round(coef(slr_model)[1], 3),
            "+",
            round(coef(slr_model)[2], 3),
            "* Poverty_Rate"))
```

The estimated regression equation is "Fertility_Rate = 2.981 + 0.036 * Poverty_Rate", where 2.981 (the intercept) is the predicted fertility rate when the extreme poverty rate is zero and 0.036 (the slope) is the expected change in the fertility rate for each one-percentage-point change in the extreme poverty rate. The intercept suggests that even in the absence of extreme poverty, the fertility rate is still about 3 children per woman. The slope indicates that when extreme poverty increases by 1 percentage point, the fertility rate is expected to increase by 0.036 children per woman.

## Model Fit

```{r}
#| label: variation-table
#| code-fold: true
#| code-summary: "Show variation-table code"

var_in_response <- var(clean_data2$mean_fertility)
var_in_fitted <- var(fitted(slr_model))
var_in_residuals <- var(slr_model$residuals)

variation_table <- data.frame(
  "Variance Type" = c("Response", "Fitted", "Residuals"),
  "Variance Value" = c(var_in_response, var_in_fitted, var_in_residuals)
)

print(variation_table)
```

The proportion of variability explained (R²) is the variance of the fitted values divided by the variance of the response: 0.3640/0.8558 = 0.425. About 42.5% of the variation in fertility rates can therefore be explained by extreme poverty. The remaining variation is most likely influenced by other socioeconomic and cultural factors not included in this model.

## Visualising Model Simulations

```{r, (1,2)}
#| label: simulated-data-comparison
#| code-fold: true
#| code-summary: "Show simulated-data-comparison code"

set.seed(33498)

# Simulate fertility as fitted value plus normal noise (one draw per country).
simulated_data <- predict(object = slr_model) +
  rnorm(192, mean = 0, sd = sigma(slr_model))

simulated_poverty_vs_fertility_rates <- clean_data2 |>
  mutate(simulated_fertility = simulated_data) |>
  ggplot(mapping = aes(x = mean_poverty,
                       y = simulated_fertility)) +
  geom_point() +
  theme_light() +
  labs(title = "Extreme Poverty vs. Simulated Women's Fertility",
       subtitle = "Simulated Fertility Rate (children per woman)",
       x = "Extreme Poverty Rate (%)",
       y = "")

grid.arrange(poverty_vs_fertility_rates,
             simulated_poverty_vs_fertility_rates,
             ncol = 2)
```

The scatterplots comparing observed and simulated fertility rates against extreme poverty both show a clear positive relationship: higher extreme poverty rates are associated with higher fertility rates. This suggests that the simulated data reasonably reproduce the overall trend in the observed data. The observed data (left plot) show slightly more variability, with points scattered more irregularly around the trend line, while the simulated data (right plot) cluster closer to it. The tighter clustering in the simulated data suggests that other factors, such as cultural, economic, or healthcare-related variables, affect fertility rates in ways not accounted for by the model.

## Generating Multiple Predictive Checks

```{r}
#| label: simulated-datasets-regression-comparison
#| code-fold: true
#| code-summary: "Show simulated-datasets-regression-comparison code"

set.seed(33498)

# Simulate one dataset from the fitted model, regress observed fertility
# on simulated fertility, and return that regression's R-squared.
generate_and_regress <- function(observed = clean_data2) {
  simulated_data <- predict(object = slr_model) +
    rnorm(192, mean = 0, sd = sigma(slr_model))
  simulated_dataset <- observed |>
    mutate(simulated_fertility = simulated_data)
  regression <- lm(simulated_dataset$mean_fertility ~ simulated_dataset$simulated_fertility)
  regression_summary <- summary(regression)
  return(regression_summary$r.squared)
}

# Repeat 1,000 times and plot the distribution of R-squared values.
simulated_rsquared_values <- tibble(
  simulation = 1:1000,
  rsquared = map_dbl(.x = 1:1000, .f = ~ generate_and_regress())
) |>
  ggplot(aes(x = rsquared)) +
  geom_histogram() +
  labs(title = "Distribution of Regression on Observed vs. Simulated Datasets",
       subtitle = "Counts",
       x = "R-Squared Values",
       y = "") +
  theme_light()

plot(simulated_rsquared_values)
```

The histogram displays the distribution of R² values obtained by regressing observed fertility on each of 1,000 simulated fertility datasets. The R² values center around 0.17, meaning that fertility simulated from the model accounts for about 17% of the variance in observed fertility.

The distribution is relatively symmetric and bell-shaped, with most R² values between 0.1 and 0.3. While this suggests a moderate correlation, a large proportion of variation remains unexplained.

The relatively low R² values suggest that extreme poverty alone does not fully determine fertility rates. Other socioeconomic, cultural, and policy-related factors likely influence the observed fertility trends.

## Conclusion

This analysis explored the relationship between extreme poverty and women’s fertility rates using both observed and simulated data. We find a positive correlation between extreme poverty and fertility rates, with the regression equation Fertility_Rate = 2.981 + 0.036 * Poverty_Rate. The model explains 42.5% of the variation in fertility rates, indicating that while poverty is an important factor, other variables also contribute to fertility. Regressions of observed fertility on 1,000 simulated datasets produced R² values around 0.17, which is consistent with other socioeconomic, cultural, and healthcare-related factors also influencing the observed fertility trends.